
Add port counter metrics to rdma health checks #62

Open
atoniolo76 wants to merge 5 commits into main from alessio/add-port-counter-metrics-to-rdma-health-checks

Conversation

@atoniolo76
Contributor

Adds port counter metrics to the RDMA health checks. On EFA, rdma_write_bytes and rdma_write_recv_bytes hold the total number of transmitted and received bytes, respectively. On GCP/OCI, port_xmit_data and port_rcv_data count transmitted and received data in 4-byte words, so we multiply them by 4 to convert to bytes. We then print the delta of each counter between the start and the end of the bandwidth tests.
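The before/after delta described above could look roughly like the sketch below. It is not the code in this PR: the `COUNTER_SCALE` table, the `read_counters` and `counter_deltas` helpers, and the `"efa"`/`"ib"` keys are hypothetical names chosen for illustration, and the counter directory is passed in by the caller rather than assumed.

```python
from pathlib import Path

# The InfiniBand-style counters are reported in 4-byte words (per the description above).
WORD_BYTES = 4

# Hypothetical table: counter file name -> multiplier that converts it to bytes.
COUNTER_SCALE = {
    "efa": {"rdma_write_bytes": 1, "rdma_write_recv_bytes": 1},
    "ib": {"port_xmit_data": WORD_BYTES, "port_rcv_data": WORD_BYTES},
}


def read_counters(counter_dir, kind):
    """Read every counter file for `kind` from `counter_dir`, scaled to bytes."""
    values = {}
    for name, scale in COUNTER_SCALE[kind].items():
        raw = int(Path(counter_dir, name).read_text().strip())
        values[name] = raw * scale
    return values


def counter_deltas(before, after):
    """Bytes moved between two snapshots taken with read_counters."""
    return {name: after[name] - before[name] for name in before}
```

A caller would snapshot the counters with `read_counters` before the bandwidth test, snapshot again afterwards, and print `counter_deltas(before, after)`. Scaling at read time keeps the delta logic identical for both counter families.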

I also added a simple printout of the cloud provider in modal_bw_ib.py.

Checklist

  • [*] Example is documented with comments throughout, in a Literate Programming style.
  • [*] Example does not require third-party dependencies to be installed locally
  • [*] Example follows the style guide
  • [*] Example pins its dependencies
    • [*] Example pins container images to a stable tag, not a dynamic tag like latest
    • [*] Example specifies a python_version for the base image, if it is used
    • [*] Example pins all dependencies to at least minor version, ~=x.y.z or ==x.y
    • [*] Example dependencies with version < 1 are pinned to patch version, ==0.y.z

(Modal's internal guide page for this repo is Multi-node examples guidance.)

Outside contributors

You're great! Thanks for your contribution.

@atoniolo76 atoniolo76 requested a review from pawalt March 4, 2026 21:31
@atoniolo76 atoniolo76 changed the title from "Alessio/add port counter metrics to rdma health checks" to "Add port counter metrics to rdma health checks" Mar 4, 2026
2 participants